Add option to drop partial buckets from date_histogram visuals #19979
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
Thanks a lot for your contribution @blfrantz! As you mention, this is something aggregation specific. Adding it to handler.js means it will only work with pie, area, line, bar, heatmap and gauge charts. It won't work with all the others (maps, metric, tagcloud, more to come, any 3rd party visualizations). I would suggest implementing this as an aggregation option. The date histogram aggregation config and logic is in The actual conversion of data happens in let me know what you think and if this is helpful.
Thanks @ppisljar! Sounds like good advice. I like the idea of putting this with the date_histogram aggregation instead of the axis config. I was thinking this was more of an axis display option which led me to my implementation, but yours should be a lot cleaner. I'll work on this and update the PR shortly (doing this in my free time so it may be a few days).
Hey @ppisljar, now that I've gotten into it I have a question: is it possible to get the time range covered by the query from within the response_handlers? It seems like this information should be knowable since the exact timestamps are specified in the "range" portion of the _msearch query, but I can't find where to get this from the The nice thing about my first approach, ugly as it was, is it had access to this range info and could use the same logic as the "this area contains partial data" tooltip (which incidentally also made it easy to extend that tooltip to say something more relevant in this case). Any suggestions? I do like the genericism of your suggestion, but it's not clear to me where to get this range information. Edit: One other nice thing about editing the data in handler.js and treating this as a view option rather than an aggregation option is that we don't need to requery when toggling this setting. But that's minor.
@ppisljar, any thoughts? Thanks!
Sorry for the late reply. In your response handler you can access the timefilter on preferably you would be able to figure this out from the Elasticsearch response, but I am not sure if that is possible.
No problem, @ppisljar. So in the relative search case, timeRange only contains something like Unless this information is available somewhere I'm not aware of (or not that hard to pass in), I'm beginning to think my original approach may be more realistic. In the same way that the "partial data" tooltip only shows up for some types of visuals (not all where it could be relevant), I'm wondering if having this feature exist only for some of the relevant cases (arguably the ones that matter most) is enough for now? What do you think? If you're ok with that approach, can you provide feedback on my 4th question (about test frameworks)? Thanks!
let me know if this helps
@ppisljar thanks for that tip, it worked great. I've updated the PR with the new implementation, let me know what you think.
Thanks @blfrantz!
I think this will work much better (especially with our future plans).
I have one more suggestion. Generally I think we are always talking about the first and last bucket. Would it make sense to let tabify do its job (convert to table) and then check just some rows and remove those in something like a postprocessing step on tabify? This way we wouldn't need to pass time information all the way down to TabifyBuckets and we could keep this code more isolated.
@@ -81,4 +82,20 @@ TabifyBuckets.prototype._orderBucketsAccordingToParams = function (params) {
  }
};

TabifyBuckets.prototype._dropPartials = function (params, timeRange) {
  if (params.drop_partials && !this.objectMode && this.buckets.length > 1) {
Rather than wrapping the whole thing in an if statement, return early:
if (!params.drop_partials || this.objectMode || this.buckets.length <= 1) { return; }
Maybe move the if (params.drop_partials) out of this function?
Also join it with the if statement below, and extract to variables so it's easier to read:
const isTimeField = params.field.name === timeRange.name;
if (this.objectMode || this.buckets.length <= 1 || !isTimeField) {
  return;
}
Thanks @ppisljar. That makes sense, and does make the code organization more elegant/intuitive. However I'm finding that it makes the dropPartials function itself rather more complicated and therefore more brittle. Because of the way the tabified data is structured, things that were simple become a bit complicated in the proposed approach. In the no-split-charts case, the following works (called with
That's not too bad, but some things that could be simple (like determining the interval) get a bit complicated because tabify's rows array contains a separate row for every value on each series, which means there can be multiple rows per time bucket, which means I have to scan the array until I find a new time to determine the interval. Previously, I could just compare the first and second items in the array. Furthermore, the above code doesn't handle the split charts case (whereas the current PR code does), because in that event tableGroup contains tables of tables, and so I'd need to process this recursively or something. That's not a big deal, but as this gets more complex (and computationally expensive) I wanted to see if you thought this was still the best approach. It also seems harder to test. Apologies if I'm missing something. Thanks!
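To illustrate the interval scan described above, here is a minimal sketch. It assumes each tabified row is a plain array whose first cell holds the bucket timestamp in milliseconds; the helper name `intervalFromRows` and that row layout are made up for illustration and are not tabify's actual structure.

```javascript
// Hypothetical helper: derive the bucket interval from flattened rows.
// Assumption: rows are sorted by time and row[timeColumn] is a numeric
// timestamp in milliseconds (real tabify rows may wrap values in objects).
function intervalFromRows(rows, timeColumn = 0) {
  if (rows.length === 0) {
    return undefined;
  }
  const firstTime = rows[0][timeColumn];
  // Several rows can share one time bucket (one per series value), so scan
  // forward until the timestamp changes.
  for (const row of rows) {
    if (row[timeColumn] !== firstTime) {
      return row[timeColumn] - firstTime;
    }
  }
  // Only one distinct bucket: the interval cannot be inferred.
  return undefined;
}

const rows = [
  [0, 'series-a', 3],
  [0, 'series-b', 5],
  [60000, 'series-a', 4],
  [60000, 'series-b', 1],
];
console.log(intervalFromRows(rows)); // 60000
```

With raw buckets the same value is just the second key minus the first key, which is why the bucket-level approach stays simpler.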
You are right, the implementation as it is in the current PR seems to be way simpler.
I left some nitpicky comments, feel free to ignore them.
@timroes can you also take a look?

<icon-tip
  position="'right'"
  content="'Removes buckets that include times not covered by the Time Range.'"
@gchaps What would be the best wording we could use here to describe the "Drop partial buckets" option? Besides the above I thought about something like:
Removes buckets from the beginning and end that are partially outside the visualization's time range.
Maybe something like this:
Remove buckets that span time outside the time range so the histogram doesn't start and end with incomplete buckets.
@@ -74,7 +75,8 @@ const BasicResponseHandlerProvider = function (Private) {
       const tableGroup = aggResponse.tabify(vis.getAggConfig().getResponseAggs(), response, {
         canSplit: true,
         asAggConfigResults: true,
-        isHierarchical: vis.isHierarchical()
+        isHierarchical: vis.isHierarchical(),
+        timeRange: getTime(vis.indexPattern, vis.filters.timeRange).range
@ppisljar Since we talked earlier about removing vis.filters, should we introduce a new reference here? Do we have an idea how we remove that in the future?
vis.filters is going away in the long run, but the response handlers will still get access to the timeRange (probably being passed in directly as a parameter in the future) so this should not be a problem.
@blfrantz If you create a visualization without a time field you will get an error for this line because getTime will return undefined if no time field is specified on the selected indexPattern.
You can verify the error using the shakespeare dataset https://www.elastic.co/guide/en/kibana/current/tutorial-load-dataset.html and just creating a bar chart with that index.
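One way to guard against this is sketched below. `getTimeStub` is only a stand-in written for this example: the one behavior taken from the comment above is that the lookup returns undefined when the index pattern has no time field. The returned object shape, the `timeFieldName` property, and the `resolveTimeRange` name are assumptions for illustration.

```javascript
// Stand-in for getTime, written only for this sketch. It returns undefined
// when the index pattern has no time field; the returned shape is assumed.
function getTimeStub(indexPattern, timeRange) {
  if (!indexPattern.timeFieldName) {
    return undefined;
  }
  return {
    range: {
      [indexPattern.timeFieldName]: { gte: timeRange.from, lte: timeRange.to },
    },
  };
}

// Guarded lookup: never dereference .range on an undefined result.
function resolveTimeRange(indexPattern, timeRange) {
  const time = getTimeStub(indexPattern, timeRange);
  return time ? time.range : undefined;
}

const filter = { from: 'now-15m', to: 'now' };
console.log(resolveTimeRange({ timeFieldName: '@timestamp' }, filter));
console.log(resolveTimeRange({ title: 'shakespeare' }, filter)); // undefined
```

Downstream code then has to treat an undefined timeRange as "nothing to drop" rather than as an error.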
@ppisljar & @timroes: I just force-pushed a cleanup of the commit history to remove all the old iterations. As part of this I added the new tooltip text recommended by @gchaps. I also just rebased onto the latest upstream, fixed a couple of my tests that were broken by some recent upstream changes, and sanity tested that the feature still works as expected. Awaiting any further instructions. Thanks!
Jenkins, test this
💔 Build Failed
@markov00 Thanks for the feedback and tip about that test failure. I've merged and made the requested fixes. I also confirmed that the failing Shakespeare test passes locally now.
jenkins test this
💔 Build Failed
jenkins test this
💚 Build Succeeded
jenkins test this, want to be sure it was a flaky CI test failure
💚 Build Succeeded
What's the next step here?
Jenkins, test this - I'll give it a last test run, then will merge it.
💚 Build Succeeded
…ic#19979)
* Add Drop Partials option to date histogram agg settings UI
* Add timeRange to aggOpts and parse in _response_writer
* Implement dropPartials method in TabifyBuckets
* Fixed a couple issues
* Fixed issue with undefined timeRange
* Use braces for conditionals
Hi Brian, thanks a lot for your first contribution to Kibana! I just merged your commit on master and it will be backported to 6.x, so that these changes will be released in 6.5. Cheers
… (#22528)
* Add Drop Partials option to date histogram agg settings UI
* Add timeRange to aggOpts and parse in _response_writer
* Implement dropPartials method in TabifyBuckets
* Fixed a couple issues
* Fixed issue with undefined timeRange
* Use braces for conditionals
Awesome, thanks everyone!
Pinging @elastic/kibana-app
We just updated to 6.5.2, and I still don't see the checkbox option, even for a newly created visualization.
@Benny-Git unfortunately that PR was missing the backport to 6.5 😞 This should be released in a future 6.5 patch release instead, sorry. @gchaps Could you perhaps remove
@timroes thanks for the very quick response.
@timroes Done.
Closes #2806.
Note: description has changed since the original version to reflect a new implementation.
Date Histograms often begin and/or end with incomplete buckets, which makes the beginning/end of time-series charts appear to show steep up/down trends which can be misleading or alarming. This feature adds a new "Drop partial buckets" option for the Date Histogram aggregation. It only appears/applies when the chosen field is the same as the index's Time Filter field (that's the only case where this feature makes sense).
When selected, any buckets which span more time than is covered by the query's Time Range will be removed from the chart.
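As a rough illustration of the rule above, the sketch below drops every bucket that starts before the time range or ends after it. This is a standalone approximation, not the code from this PR: the `dropPartialBuckets` name, the `{ min, max }` millisecond range, and the evenly spaced, sorted buckets are all assumptions.

```javascript
// Hypothetical standalone version of the idea; not the code from this PR.
// Assumptions: bucket.key is the bucket start in ms, buckets are evenly
// spaced and sorted, and timeRange is { min, max } in ms.
function dropPartialBuckets(buckets, timeRange) {
  // Return early when there is nothing to compare against.
  if (!timeRange || buckets.length <= 1) {
    return buckets;
  }
  const interval = buckets[1].key - buckets[0].key;
  return buckets.filter((bucket) => {
    const startsInRange = bucket.key >= timeRange.min;
    const endsInRange = bucket.key + interval <= timeRange.max;
    return startsInRange && endsInRange;
  });
}

const minute = 60 * 1000;
const buckets = [
  { key: 0 * minute, doc_count: 2 },
  { key: 1 * minute, doc_count: 9 },
  { key: 2 * minute, doc_count: 8 },
  { key: 3 * minute, doc_count: 1 },
];
// The query covers 00:00:30 to 00:03:20, so the first and last buckets
// only partially overlap it.
const kept = dropPartialBuckets(buckets, { min: 30 * 1000, max: 200 * 1000 });
console.log(kept.map((b) => b.key)); // [ 60000, 120000 ]
```

In this example the low counts in the first and last buckets are exactly the misleading edge values the option is meant to hide.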